An Empirical Study for Software Fault-Proneness Prediction with Ensemble Learning Models on Imbalanced Data Sets

نویسندگان

  • Renqing Li
  • Shihai Wang
چکیده

Software faults could cause serious system errors and failures, leading to huge economic losses. But currently none of inspection and verification technique is able to find and eliminate all software faults. Software testing is an important way to inspect these faults and raise software reliability, but obviously it is a really expensive job. The estimation of a module’s fault-proneness is important to minimize the software testing resources required by guiding the resource allocation on the high-risk modules. Consequently the efficiency of software testing and the reliability of the software are improved. The software faults data sets, however, originally have the imbalanced distribution. A small amount of software modules holds most faults, while the most of modules are fault-free. Such imbalanced data distribution is really a challenge for the researchers in the field of prediction for software faultproneness. In this paper, we make an investigation on software fault-prone prediction models by employing C4.5, SVM, KNN, Logistic, NaiveBayes, AdaBoost and SMOTEBoost based on software metrics. We perform an empirical study on the effectiveness of these models on imbalanced software fault data sets obtained from NASA’s MDP. After a comprehensive comparison based on the experiment results, the SMOTEBoost reveals the outstanding performances than the other models on predicting the high-risk software modules with higher recall and AUC values, which demonstrates the model based on SMOTEBoost has a better ability to estimate a module’s fault-proneness and furthermore improve the efficiency of software testing.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluation of Classifiers in Software Fault-Proneness Prediction

Reliability of software counts on its fault-prone modules. This means that the less software consists of fault-prone units the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...

متن کامل

Using Source Code Metrics and Ensemble Methods for Fault Proneness Prediction

Software fault prediction model are employed to optimize testing resource allocation by identifying fault-prone classes before testing phases. Several researchers’ have validated the use of different classification techniques to develop predictive models for fault prediction. The performance of the statistical models are proven to be influenced by the training and testing dataset. Ensemble meth...

متن کامل

A Statistical Framework for the Prediction of Fault-Proneness

Accurate prediction of fault prone modules in software development process enables effective discovery and identification of the defects. Such prediction models are especially valuable for the large-scale systems, where verification experts need to focus their attention and resources to problem areas in the system under development. This paper presents a methodology for predicting fault prone m...

متن کامل

Empirical Studies to Predict Fault Proneness: A Review

Empirical validations of software metrics are used to predict software quality in the past years. This paper provides a review of empirical studies to predict software fault proneness with a specific focus on techniques used. The paper highlights the milestone studies done from 1995 to 2010 in this area. Results show that use of machine learning languages have started.This paper reviews works d...

متن کامل

A systematic and comprehensive investigation of methods to build and evaluate fault prediction models

This paper describes a study performed in an industrial setting that attempts to build predictive models to identify parts of a Java system with a high fault probability. The system under consideration is constantly evolving as several releases a year are shipped to customers. Developers usually have limited resources for their testing and would like to devote extra resources to faulty system p...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • JSW

دوره 9  شماره 

صفحات  -

تاریخ انتشار 2014